(57) Abstract: A system and method is provided for detecting, tracking and/or blocking control
signal attacks, which can occur between local computer systems and/or between remote computer
systems, network links, and/or routing systems over a computer network. The system includes a
router monitor adapted to receive a plurality of control signals and related information from the
computer network and to process the plurality of control signals and related information to detect
one or more control signal anomalies. The router monitor is further adapted to generate a
plurality of alert signals representing the one or more control signal anomalies. The system
further includes a controller that is coupled to the router monitor and is adapted to receive the
plurality of alert signals from the router monitor. The controller is constructed and arranged to
respond to the plurality of alert signals by tracking attributes related to the one or more control
signal anomalies to at least one source, and to block the one or more control signal anomalies
using a filtering mechanism executed in close proximity to the at least one source.
[...] A COMPUTER NETWORK

Related Application

Field of the Invention 

particularly to network monitoring and control systems. 

Background of the Invention

Computer systems are often interconnected by computer networks for the purpose of
communicating information. Computer systems connected on such networks communicate
[...] fiber or copper cabling, air, network communication [...] or a
combination thereof using one or more communication protocols such as TCP/IP, for example.
Networks can be organized into various types of topologies.

Figure 1 illustrates one such topology that includes a network 100 having several [...].
Computer systems of each local area network are connected to communications links 101a [...].
When a source computer system on a local area network 101 or 102 sends
[...] to a destination computer system on the same network 101 or 102, the source computer system
prepares a message (e.g., frame, packet, cell, or the like) that includes the address of the
destination computer system and transmits the message on the communications link 101a or
102a. Other computer systems on that same local area network 101 or 102 (i.e., connected to
the communications link 101a or 102a) read the message that was transmitted. The
destination computer system detects that its address is included in that message, and it
processes the message accordingly.
Routing systems are generally used to couple one or more local networks to other
networks (e.g., other public or private networks (e.g., the Internet, corporate network, etc.)). A
routing system 103 is typically a dedicated special-purpose computer system to which each
network 101, 102 is coupled, and routes information between these networks. The routing
system 103 maintains routing information that identifies the location of other networks. In a
TCP/IP routed network, for example, routing system 103 monitors packets sent on each
network 101-102 to detect when a computer system on one network 101-102 is sending a
packet to a computer system on another network (e.g., networks 101 or 102). When the routing
system 103 detects such a packet, it forwards that packet onto the communications link 101a or
102a for the network 101 or 102 to which the destination computer system is connected. In
this way, the routing system 103 interconnects networks 101 and 102 into an overall network
100. Similar routing techniques may be used, for example, to interconnect local area networks
(LANs), wide area networks (WANs), and the Internet 104.

Routers make forwarding decisions based on local information stored in the router that
identifies a next "hop" based on the destination of a packet. That is, the router generally
forwards a packet out an interface to one or more other systems based on the destination
address of the packet.
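
The next-hop decision described above can be illustrated with a minimal sketch. It assumes a
longest-prefix-match lookup, which is how IP routers commonly choose among overlapping entries.
The table contents, the interface names, and the function name next_hop are hypothetical and are
not taken from this specification.

```python
import ipaddress

# Hypothetical local forwarding information: destination prefix -> outgoing interface.
FORWARDING_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): "if0",
    ipaddress.ip_network("10.1.0.0/16"): "if1",
    ipaddress.ip_network("0.0.0.0/0"): "if-default",
}


def next_hop(destination: str) -> str:
    """Return the interface for the most specific prefix that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # The longest prefix (largest prefixlen) is the most specific match.
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]
```

In this sketch a packet addressed to 10.1.2.3 matches all three entries, and the /16 entry is
chosen because it is the most specific.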

Routers communicate among each other to share information regarding the networks to
which they are connected. This communication causes routers to update their local databases
with this communicated information. Generally, routers maintain routing tables that store
entries regarding the networks to which the router can communicate. Communication between
routers is performed according to a method referred to in the art as a routing protocol. There
are many different types of routing protocols used for sharing routing information among
computer systems. For the TCP/IP protocol, for example, there are numerous routing protocols
including Border Gateway Protocol (BGP), Routing Information Protocol (RIP), Open Shortest
Path First (OSPF), Interior Gateway Routing Protocol (IGRP) and Enhanced Interior Gateway
Routing Protocol (EIGRP) and others. An organization may implement one or more routing
protocols within any network.

Due to the scale of communication networks such as the Internet, there are two types of
routing protocols, intradomain and interdomain protocols, referred to in the art as Interior
Gateway Protocols (IGPs) and Exterior Gateway Protocols (EGPs), respectively. Intradomain
routing protocols are generally run in networks that are limited in scope and have a single
[...] administrative domains, where network entities of different organizations share information
[...] that are reachable through the provider [...]
[...] is the Border Gateway Protocol (BGP). BGP may be used as either an
intradomain or interdomain routing protocol.

Figure 2 shows a functional block diagram of a network 101 that includes a number of
networks (e.g., ISP networks) coupled together by a computer network. Each of the ISPs can
[...] together over the Internet. The wide area network 101 of Figure 2 further shows a number of
[...] a number of personal computers 105a, 105b and 105c that are adapted to receive information
[...] network shown in Figure 2. A
server 106 can include a number of computer subsystems 106a, 106b, 106c, 106d and
106e as well as associated databases (not shown). For example, server 106 may include
a group of servers (e.g., server farm) configured to respond to requests for
[...] over a network. The computer subsystems 106a, 106b, 106c, 106d and
106e may be, for example, web servers of a web page hosting site that are adapted to [...]
to the web customers 105 over the various ISPs.

In one specific example, a personal computer 105a of the web customers 105 can
communicate a request to a web page hosting computer (server 106, for example) for
requesting a particular web page having predetermined content. The request can be
communicated from the personal computer 105a to the web page hosting computer 106 over a
number of different paths within network 101 (e.g., by path ABCD or path ABEFG).
The choice of path is determined automatically through the exchange of signaling
information between networking devices along all paths traversing some combination of
ABCDEFG. Network devices in ISP1 and ISP2 signal availability of a path to site 106 to
each neighbor network, ISP3 and ISP4. This signaling may occur, for example, according to a
routing protocol that defines methods by which network devices determine how to forward
information. More particularly, ISP3 and ISP4 and other network devices of network 101
communicate routing protocol information amongst each other indicating their knowledge of
the network.

Once the paths are established, messages (e.g., web page requests from customers in
105) can begin to travel to the servers in 106. In the path ABCD, ISP7 initially receives the
request for the web page from the personal computer 105a and forwards the request to ISP3.
Similarly, ISP3 forwards the request for the web page to ISP1, which ultimately forwards the
request to the web page hosting computer 106. Each of the ISPs also sends control signals to
the other ISPs. The control signals include, among other things, information related to a
return path from the site 106 to the requesting computer.

Referring further to Figure 3, one problem occurs when an attacker computer system 107
of ISP5, for example, maliciously sends ISP4 erroneous or deceptive control signals. The
deceptive control signals can include information indicating that ISP5 has the most efficient
access to the web page hosting computer 106. In this example, the deceptive control signals
would be communicated back to the personal computer 105a over the data path HIJK. In this
instance, the request from the personal computer 105a for the web page would not be received by
the web page hosting site 106, because the request may actually be redirected to, for example, an
attacker computer system 107 of ISP5 or to an incorrect destination. This may result in
reduced access to the web page hosting computer 106, which can result in reduced business,
lost sales and/or a general theft of service that the web page hosting site 106 would otherwise
realize. This scenario is one type of what is referred to in the art as a Denial of Service (DoS)
attack.
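
The effect of a deceptive control signal on path choice can be sketched as follows. This is a
simplified illustration only: it assumes that a network prefers the announcement with the
shortest path, which is just one of several criteria real routing protocols apply. The
dictionary layout and the function name select_path are hypothetical.

```python
from typing import Dict, List


def select_path(announcements: List[Dict]) -> Dict:
    """Pick the announcement with the shortest path (a deliberately simplified rule)."""
    return min(announcements, key=lambda a: len(a["as_path"]))


# A legitimate announcement toward site 106, learned through ISP3 and ISP1.
legitimate = {"from": "ISP3", "as_path": ["ISP3", "ISP1"]}
# A deceptive announcement in which ISP5 falsely claims the most efficient access.
deceptive = {"from": "ISP5", "as_path": ["ISP5"]}

before = select_path([legitimate])
after = select_path([legitimate, deceptive])
```

Under this assumed rule the deceptive announcement displaces the legitimate one as soon as it is
received, so requests follow the attacker's path without any operator action.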

Conventional routing systems 103 (Figure 1) have attempted to avoid erroneous control
signal attacks, as described above, by employing various types of control signal encryption
techniques to validate the integrity of the source of control signals. These control signal
encryption techniques require that a number of public keys be distributed among the various
ISPs, so that the keys can be processed with other information residing on the various ISPs to
encode and decode the control signals. However, this technique has not yet been implemented
because of the complexity and associated costs related to the hardware and software necessary
to encode and decode the control signals.
Other conventional routing systems 103 have attempted to avoid deceptive control
[...] the ISPs are required to subscribe to the IRR and provide details of their policy and
customer topology information. Using this topology information, providers can generate a
number of access control lists ("ACLs"). The ACLs generally describe [...]
particular ISP, which receives a control signal from another ISP, can verify whether the
particular ISP should accept or reject the control signal. However, this technique is limited
because it requires all the various ISPs of the wide area network to subscribe to the IRR. There
has only been limited acceptance of the IRR to date and, therefore, limited effectiveness.

In addition to security concerns, another problem stems from the rate and volume of
topology signaling information exchanged between ISPs. The volume and rate of change in
[...] and debugging challenges. As the signaling communication is automated, changes in network
paths in response to failures or policy changes may occur without the knowledge or
intervention of network operators.



Summary of the Invention

According to one aspect of the invention, a method is provided for monitoring control
signal traffic over a computer network comprising a plurality of network communication
systems by a computer system, the method comprising acts of receiving, from at least one of
the plurality of network communication systems, at least one control signal communicated to
one or more other network communication systems; and storing the at least one control signal
in a database of the computer system. According to one embodiment of the invention, the at
least one control signal controls forwarding of data in the computer network. According to one
embodiment of the invention, the at least one control signal is a route entry stored in a memory
of the at least one of the plurality of network communication systems. According to one
embodiment of the invention, the at least one control signal is a route update transmitted by the
at least one of the plurality of network communication systems. According to one embodiment
of the invention, the method further comprises an act of determining, based on the at least one
control signal, an anomaly in the computer network.
According to one embodiment of the invention, the method further comprises an act of
generating an alert signal based on the determined anomaly. According to one embodiment of
the invention, the act of storing further comprises storing a plurality of control signals over
time. According to one embodiment of the invention, the method further comprises an act of
performing, in response to the act of determining the anomaly, an administrative act in the
computer network. According to one embodiment of the invention, the anomaly includes one
or more attributes and the method further comprises an act of tracking the one or more
attributes of the anomaly to at least one source. According to one embodiment of the
invention, the method further comprises an act of filtering a control signal produced by the at
least one source that relates to the anomaly. According to one embodiment of the invention,
the act of filtering is performed in one of the plurality of network communication systems.
According to one embodiment of the invention, the one of the plurality of network
communication systems is a router. According to one embodiment of the invention, the
method further comprises an act of creating a filter in the router to filter control data
transmitted by the at least one source.
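
The acts of receiving, storing, determining an anomaly and generating an alert can be sketched
as below. This is a minimal illustration under stated assumptions, not the claimed
implementation: the database is an in-memory SQLite table, a control signal is reduced to a
(prefix, origin AS) pair, and the anomaly rule (the origin of a prefix differs from the one most
recently stored) is a hypothetical example. The names open_store and receive are likewise
illustrative.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database that keeps control signals over time."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE control_signals "
        "(ts REAL, router TEXT, prefix TEXT, origin_as INTEGER)"
    )
    return db


def receive(db, ts, router, prefix, origin_as):
    """Store one control signal; return an alert dict if it looks anomalous, else None."""
    previous = db.execute(
        "SELECT origin_as FROM control_signals WHERE prefix = ? "
        "ORDER BY ts DESC, rowid DESC LIMIT 1",
        (prefix,),
    ).fetchone()
    db.execute(
        "INSERT INTO control_signals VALUES (?, ?, ?, ?)",
        (ts, router, prefix, origin_as),
    )
    # Assumed anomaly rule: the origin changed relative to the stored history.
    if previous is not None and previous[0] != origin_as:
        return {
            "prefix": prefix,
            "expected_origin": previous[0],
            "observed_origin": origin_as,
            "source": router,  # attribute that could be tracked back and filtered
        }
    return None
```

The "source" field of the returned alert stands in for the attributes that would be tracked to
a source before a filter is created in a router.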

According to another aspect of the invention, an apparatus is provided for monitoring
control signal traffic over a computer network comprising a plurality of network
communication systems. The apparatus comprises a monitor that receives, from at least one of
the plurality of communications systems, at least one control signal communicated to one or
more other network communications systems and which stores the at least one control signal in
a database. According to one embodiment of the invention, the apparatus further comprises a
controller that receives, from the monitor, the at least one control signal and stores the at least
one control signal in the database. According to one embodiment of the invention, the monitor
stores the at least one control signal in a persistent archive. According to one embodiment of
the invention, the apparatus further comprises a detector that detects an anomaly based on the
at least one control signal. According to one embodiment of the invention, the apparatus
further comprises a profiler that generates a profile of at least one of network communication
trends in the computer network and topology of the computer network. According to one
embodiment of the invention, the apparatus further comprises a controller that is adapted to
receive the detected anomaly from the detector, and is adapted to communicate the anomaly in
an alert message.






PCT/US02/20559 

WO 03/003210 






[...] that is encoded with instructions for execution on a computer system, the instructions, when
executed, perform a method comprising acts of receiving, from at least one of the plurality of
[...] other network communication systems, and storing the at least one control signal in a database
of the computer system. According to one embodiment of the invention, the at least one
control signal controls forwarding of data in the computer network. According to one
embodiment of the invention, the at least one control signal is a route entry stored in a memory
of the at least one of the plurality of network [...]. According to one embodiment of the invention,
[...] comprises an act of determining, based on the at least one control signal, an anomaly in
the computer network.

According to one embodiment of the invention, the method further comprises an act of
generating an alert signal based on the determined anomaly. According to one embodiment of
the invention, the act of storing further comprises storing a plurality of control signals over
time. According to one embodiment of the invention, the method further comprises an act of
performing, in response to the act of determining the anomaly, an administrative act in the
computer network. According to one embodiment of the invention, the anomaly includes one
or more attributes and the method further comprises an act of tracking the one or more
attributes of the anomaly to at least one source. According to one embodiment of the
invention, the method further comprises an act of filtering a control signal produced by the at
least one source that relates to the anomaly. According to one embodiment of the invention, the
act of filtering is performed in one of the plurality of network communication systems.
According to one embodiment of the invention, the one of the plurality of network
communication systems is a router. According to one embodiment of the invention, the
method further comprises an act of creating a filter in the router to filter control data
transmitted by the at least one source.

Further features and advantages of the present invention as well as the structure and
operation of various embodiments of the present invention are described in detail below with
reference to the accompanying drawings. In the drawings, like reference numerals indicate like
or functionally similar elements.
Brief Description of the Drawings

The invention is pointed out with particularity in the appended claims. The above and
further advantages of this invention may be better understood by referring to the following
description when taken in conjunction with the accompanying drawings in which similar
reference numbers indicate the same or similar elements.

In the drawings,

Figure 1 is a high level block diagram of a conventional networked computer system;

Figure 2 is a high level block diagram of a conventional network computer system
including a plurality of Internet Service Providers each of which includes a plurality of network
computer systems;

Figure 3 is a high level block diagram of the conventional network computer system of
Figure 2 further showing misdirected traffic as a result of a Denial of Service (DoS) attack;

Figure 4 is a high level block diagram of a computer network system according to one
embodiment of the present invention;

Figure 5 is a block diagram of a system according to one embodiment of the invention
that shows a partially expanded view of the computer network system shown in Figure 4;

Figure 6 is a high level block diagram of a router monitor according to one embodiment
of the invention;

Figure 7 is a high level block diagram of a controller according to one embodiment of
the invention;

Figure 8 is a high level block diagram of one aspect of the invention that shows a
malicious control signal attack; and

Figure 9 is a block diagram of a refiner according to one embodiment of the present
invention.



Detailed Description 

Network engineers often have no method of knowing the previous state of the network
topology, or what changes in the topology occurred. This lack of instrumentation poses a
challenge for engineers in debugging customer complaints, such as recent periods of poor web
connectivity.
To add to the problems associated with [...] signaling information, [...] of
interfaces to [...] networking devices in each
provider's network. These management interfaces, however, only [...]
interfaces do not provide historical information.

[...] is incorporated into the [...]

Therefore, an unsolved need remains for a system and method [...]
tracking and blocking control signal anomalies [...] routers [...]
locations of a communications network. More generally, a system and method is
[...] control information may be, for example, routing information [...]
by routing systems. Also, the control information may be used to perform root cause analysis
of router failures. Moreover, such a system may monitor routing configuration and could
report on anomalies that occur in the routed network. These anomalies [...]
cause an interruption or degradation of service. For example, it may be useful to detect
[...] time, whether routes are invalid, whether there are abrupt shifts in topology, or the like. By
monitoring and storing control information, it becomes possible to detect such anomalies.

For purposes of illustration and to facilitate a further understanding of various aspects of
the present invention, various embodiments of the invention may be employed in an
Internet-based computer network system. However, as understood by one skilled in the art, the
present invention is not limited to Internet-based systems and can include systems employing other
computer networks (e.g., public and/or private networks) and/or stand-alone systems.

Referring to Figure 4, in one embodiment of the present invention a system 5 for
management of network signaling information and for monitoring, detecting, tracking and/or
blocking control signal anomalies communicated between routers located in various locations
of a network is incorporated in a network computer system 10. Network 10 may be, for
example, a wide area network (WAN) connecting Internet Service Provider (ISP) networks, but
it should be appreciated that various aspects of the invention may be implemented on any type
of network. The invention is not limited to any particular network configuration, routing
protocol, or routing system.

System 5 can be located on a single server computer, which is in communication with
components of computer system 10, or distributed over a plurality of server computers, which
are also in communication with components of the network computer system 10. System 5
may be implemented, for example, as hardware and/or software that implements one or more
functions associated with various aspects of the invention.

Various embodiments according to the invention may be implemented on one or more
computer systems. These computer systems may be, for example, general-purpose computers
such as those based on Intel PENTIUM-type processors, Motorola PowerPC, Sun UltraSPARC,
Hewlett-Packard PA-RISC processors, or any other type of processor. It should be appreciated
that one or more of any type of computer system may be used to implement various aspects of the
invention. Further, various aspects of the invention may be implemented on a single computer
or may be distributed among a plurality of computers attached by a communications network.

For example, system 5 may be implemented as specialized software executing in a
general-purpose computer system (not shown). A general-purpose computer system may
include a processor connected to one or more memory devices, such as a disk drive, memory,
or other device for storing data as is known in the art. Memory is typically used for storing
programs and data during operation of the computer system. Devices located within the
[...] device such as a network (e.g., a system
bus). The computer system [...] In addition, the computer system may contain one or
[...] network 18 or any network of network system 10, for example).

The computer system may be a general-purpose computer system that is programmable
[...]. The computer system may be also
[...] purpose computer system, the processor is typically
[...] Corporation. Many other
[...] Microsystems, or UNIX operating systems [...]
[...] that the present invention is not limited to a specific programming language or
computer system and that other appropriate programming languages and other appropriate
computer systems could also be used.

The network computer system 10 may include a plurality of networks, e.g., ISP
networks 14a, 14b and 14c, that are coupled together over a computer network 18 via one or
[...]. The ISPs 14a, 14b and 14c can also be coupled directly to each [...]
can include a plurality of computer network zones. These zones may correspond, for example,
to routing domains serviced by a respective ISP. As shown in Figure 4, the ISP 14a includes
computer network Zone X, Zone Y and Zone Z. The ISP 14b includes computer network Zone
U and Zone V. The ISP 14c includes computer network Zone W.

Figure 5 shows a partially expanded view of system 5 according to one embodiment of
the invention. In Figure 5, Zone X of the ISP 14a includes a number of networks coupled to a
central routing system 22. Each network is coupled to a plurality of computer systems 16a,
16b, 16c, 16e, 16f, 16g, 16h, 16i and 16j (hereinafter collectively referred to as "computer
system(s) 16"). The computer network Zones Y and Z, which are also located on the ISP 14a,
can be similarly constructed and arranged as computer network Zone X. Further, the computer
network Zones U and V, which are located on the ISP 14b, and the computer network Zone W,
which is located on the ISP 14c, can also be similarly constructed and arranged as computer
network Zone X. ISP 14a may be coupled to other ISPs (e.g., ISPs 14b, 14c) by a
communications network 18. Network 18 may also include one or more network
communication systems (e.g., routers) coupled to one or more network communication systems
(e.g., routing systems 22, 22b, 22c) for the purpose of transferring user data and control
information between the ISP networks.

The system 5 includes a router monitor 20, an optional router monitor 20b and a zone
controller 24. Router monitor 20 monitors control information associated with one or more
routers. In Zone X, the router monitor 20 is coupled to the central routing system 22. The
router monitor 20 is further coupled to a zone controller 24, which provides a primary interface
to Zone X of the ISP 14a and processes one or more messages received from one or more
router monitors. Zone controller 24 may be configured to store, in a database, control
information received from one or more router monitors.

In another embodiment of the invention, the router monitor 20 can be coupled to one or
more other router systems, such as routing system 22b, as shown in Figure 5. In addition, the
zone controller 24 can be coupled to one or more other router monitors, such as router monitor
20b, also shown in Figure 5. Further, the router monitor 20b can be coupled to one or more
other routing systems, such as the routing system 22c. Although various aspects of the
invention are described in terms of routers, it should be appreciated that any network
communication system may be monitored, and the invention is not limited to monitoring
routers or control information relating to routers. For example, switching systems and their
control information may be monitored.

The zone controller 24 located in Zone X of the ISP 14a provides a primary interface to
the computer network Zone Y and to the computer network Zone Z, which are both located on
the ISP 14a. The zone controller 24 further provides a primary interface to the computer
network Zone U and the computer network Zone V, which are located on the ISP 14b, over the
computer network 18. Similarly, the zone controller 24 further provides a primary interface to
computer network Zone W, which is located on the ISP 14c, over computer network 18.

In one embodiment of the present invention, computer systems 16 located in computer
network Zone X of the ISP 14a can each comprise a conventional computer server such as an
"NT-Server" which can be provided by Microsoft of Redmond, Washington or a "Unix Solaris
Server" which can be provided by Sun Microsystems of Palo Alto, California. These
computer systems 16 can be programmed with conventional Web-page interface software such
as: "Visual Basic", "Java", "JavaScript", "HTML/DHTML", "C++", "J++", "Perl" or
"Perlscript", "ASP", "C#", or any other programming language. These computer systems can
further be programmed with an operating system, Web server software, Web application
software, such as an e-commerce application and computer network interface software, or other
software that allows them to exchange information.

Each of the routing systems 22, 22b and 22c, shown in Figure 4, can be a conventional
router, such as a "Cisco 12000" router, available from Cisco Corporation of San Jose,
California or "M-series" or "T-series" routers available from Juniper Networks, Inc. of
Sunnyvale, California or any other router or network communication device that exchanges
topology and/or data forwarding control information (e.g., switches). The routers are
configured to gather and store a plurality of control signals and related information by, for
example, communicating using a routing protocol such as BGP. Further, each of these routing
systems can be adapted to forward control signaling information to the collector computing
system 5. The plurality of control signals and related information can include a data path
description through various ISPs 14a, 14b and/or 14c of a wide area network for enabling a
web farm or hosting computer system to provide web pages or other information to a
requesting computer system via the various ISPs 14a, 14b and/or 14c.

Figure 6 shows a router monitor 20 according to one embodiment of the invention that
collects and processes control information. Router monitor 20 includes a collector 20a coupled
to the routing system 22. The collector 20a is coupled to a persistent archive database 20b, a
detector 20c and a profiler 20d. The detector 20c is coupled to a local controller 20f. The
profiler 20d is coupled to database 20g, as well as to the local controller 20f. The local
controller 20f is coupled to a refiner 20h, which is coupled to the persistent archive database
20b. The local controller 20f is further coupled to the zone controller 24.

The router monitor 20 is adapted to receive the plurality of control signals and related
information from the routing system 22 and to process the control signals and related
information to support network engineering and management functions as well as detect
control signal anomalies. The router monitor 20b of Zone X, as well as other various router
monitors (not shown), which are included in the other various Zones U, V, W, Y and Z, may be
similarly constructed and arranged as the router monitor 20 of Zone X.

In another embodiment of the system, the router monitors may be deployed in a
multiplicity of remote network locations, such as in networks ISP1, ISP2 and ISP3. The
remote monitors may be configured, for example, by a controller to detect changes in routing
information, for example, information related to network A announced by ISP7. Examples of
remotely monitored information might include changes to the path, or detail of the signal
describing network A in ISP1, ISP2 and ISP3. For example, the monitored information may
include routing update messages according to one or more routing protocols.

Collector

More specifically, the collector 20a of the router monitor 20 is adapted to receive the
plurality of control signals and related control signal information from the routing systems 22,
22b and/or 22c. The collector 20a is further adapted to normalize or statistically categorize the
control signals and related control signal information to generate a number of records. The
collector 20a provides a copy of the record to the detector 20c and also stores a copy of the
record in the persistent archive 20b.

According to one embodiment of the invention, collector 20a may compute statistical
information regarding data it has collected, and may infer certain information. For example,
collector 20a may continuously calculate statistical information about the routing information it
receives on its peering sessions. Collector 20a may track both explicit and calculated or
implicit information. Explicit statistics may be calculated directly from the routing table stored
in a router (e.g., BGP routing table) and the routing updates (e.g., BGP update messages)
received. In the case of BGP, implicit information may include several types of BGP routing
table changes that are identified based on the current routing table state when a BGP update is
received.

Each of these statistics may be tracked separately, for example, on a network-wide,
per-router, and per-inferred-peer basis. Standard statistical measures (e.g., sum, mean, median,
standard deviation, local minima/maxima) may be calculated over a variety of time periods (for
example, five minute periods) and each sample is stored along with the time it represents in the
database for later retrieval and analysis. These statistics may be used, for example, by a query
facility in response to queries, and/or by a statistical modeling engine to build a statistical
model of network behavior (as described in more detail below).

Explicit information that collector 20a may track may include, for example:
    number of routes (average)
    number of unique AS (Autonomous System) Paths (average)
    number of BGP updates (sum)
    number of BGP announcements (sum)
    number of BGP withdrawals (sum)
    number of times each BGP peering session goes down (sum)
    number of times each BGP peering session comes up (sum)
    number of unique ASes in the routing table (average)
    number of unique origin ASes in the routing table (average)
    number of BGP communities (average)
    system start and stop

Implicit information that may be monitored may include, for example:
    number of AADup
    number of WWDup
    number of AADiff
    number of TDown
    number of TUp
    probability of ASPath adjacencies
    probability of path selection
    probability of origin AS prefix origination
    probability of peer router prefix origination
    probability of path stability

These implicit events are well-known in BGP routing. Briefly, they include:
    AADup    Duplicate Announcement - An announcement for a route identical to one that
             already exists in the routing table; the announcement must be identical in all
             attributes to the existing route, otherwise it is an AADiff.
    WWDup    Duplicate Withdrawal - A withdrawal for a route that has already been
             withdrawn from the routing table.
    AADiff   Implicit Change - A route is announced for a prefix which already has an
             existing route. However, the new route is different in one or more attributes
             than the existing route.
    TUp      Transition Up - A route comes up (is added to the routing table). This does not
             include attribute changes to an existing route.
    TDown    Transition Down - A route goes down (is removed from the routing table).
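
The event classes defined above can be sketched as a small classifier. This is a minimal illustration in Python and is not taken from the specification: the function name `classify_update`, the dictionary-based routing table keyed by prefix alone (a real collector would likely key routes per peer as well), and the use of `None` to represent a withdrawal are all assumptions made for the example.

```python
def classify_update(table, prefix, attrs):
    """Classify one BGP update against the current table state, then apply it.

    table  -- dict mapping prefix -> route attributes (hypothetical layout)
    prefix -- the prefix the update refers to
    attrs  -- route attributes for an announcement, or None for a withdrawal
    """
    existing = table.get(prefix)

    if attrs is None:                 # withdrawal
        if existing is None:
            return "WWDup"            # nothing to withdraw: duplicate withdrawal
        del table[prefix]
        return "TDown"                # route removed from the table

    if existing is None:              # announcement for a prefix with no route
        table[prefix] = attrs
        return "TUp"
    if existing == attrs:             # identical in every attribute
        return "AADup"
    table[prefix] = attrs             # same prefix, different attributes
    return "AADiff"
```

Counting the returned labels over each sampling period would yield the implicit statistics listed earlier (number of AADup, WWDup, AADiff, TDown and TUp).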

In accordance with one embodiment of the invention, collector 20a maintains BGP 
peering sessions with BGP routers to obtain information. In one embodiment of the invention, 
peering sessions are passive peering sessions. That is, collector 20a does not propagate any
BGP state or routing information to other routers. The collector 20a does, according to one 
embodiment of the invention, send BGP Keepalive messages to its peers to maintain the 
peering sessions. 

One advantage of having a totally passive approach is that the collector 20a does not
change the state of the network it is monitoring. More particularly, according to one
embodiment, collector 20a does not propagate BGP updates or state changes, which makes it a
low-impact method for monitoring network routing.

According to one embodiment of the invention, collector 20a has the ability to
understand and process routing messages. For instance, collector 20a may employ a BGP
routing protocol, and as discussed above, collector 20a may be capable of maintaining BGP
peering sessions with one or more BGP routers. Collector 20a may be capable of receiving
BGP updates from network routers over its peering sessions and may store them in a database
(e.g., persistent archive 20b and/or database 20g). Collector 20a may collect a
wide range of state and topology information from routers, and collector 20a is not limited to
collection of any type of information. The invention is not limited to any particular data
type or method for collecting and/or storing such information.

According to one aspect of the invention, data may be timestamped and stored in a
database (e.g., in persistent archive 20b). Such data may include, for example, routing state
changes and topology updates received by collector 20a. In the case of the BGP routing
protocol, data stored may include BGP attributes and the source of the change. BGP and
system state information that may be tracked and stored in the database may include, for
example:
    peering session up
    peering session down
    peering session errors
    administrative/configuration changes
    user-specified events
    system start
    system stop

This data may be timestamped, for example, to the nearest second and stored in the 
database. Also, according to another aspect of the invention, a query facility may be provided 
that allows a user (e.g., administrator or other system) to query the data stored in the database.
This may allow a user to determine, at a particular point in time, the routing state of the 
network. This may be beneficial, for example, in performing root cause analysis of a network 
problem. 

Data collected by monitor 20 may, according to one embodiment of the invention, be 
accessed using a query facility that allows a user to obtain data from one or more databases
associated with monitor 20. For example, a user may be allowed to query current and past 
routing changes and routing state. 



Routing Database 

As discussed, data collected by monitor 20 may be stored in a database (e.g., archive
20b, database 20g, or other database) that stores routing state information. Information
collected by the collector 20a may be stored, indexed, and retrieved in response to queries from
the query facility. The database may also store statistical information about the observed
routing topology and state changes. This statistical information may be generated, for example,
by routing collector 20a.

The database stores three important types of information generated by the collector 20a.
It includes a change store, which stores a timestamped, sequential archive of BGP updates.
Updates may be tagged with indices to simplify later queries, including the timestamp of the
update (with second granularity, for example) and the source of the information. The change
store may also include other event information including BGP state changes, synchronization
messages, and system-level events. Synchronization messages are internally generated,
timestamped messages that uniquely identify a snapshot (described in more detail below)
corresponding to that position in the sequential archive of changes. Updates and messages in
the change store may be stored, for example, in the order in which they were received by
collector 20a.

Snapshots are the second type of information stored by the database. A snapshot is a 
complete dump of the global routing table state at a given moment in time. A timestamp may
also be stored with each snapshot. The timestamp, for example, indicates the exact time the 
snapshot was taken. 

The routing database may also store statistical information calculated by the collector 
20a as described above. 

According to one embodiment of the invention, a system is provided that may use the
above types of information to reconstruct the routing table state at arbitrary times in the past.
By querying the statistical information, it can also show the history of instability in the routing
table state over time, and can be used to identify (or at least narrow a potential set of) the
causes of that instability. This is a major innovation over the capabilities of previous systems
used to manage routed networks.
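
One way to picture the reconstruction is to load the latest snapshot taken at or before the requested time and replay the later change records on top of it. The sketch below is illustrative only and makes several assumptions that do not come from the specification: the name `reconstruct_table`, the in-memory list layout of snapshots and changes, numeric timestamps, and the rule that a change carrying the same timestamp as a snapshot is already reflected in that snapshot (the specification instead positions snapshots in the change sequence using synchronization messages).

```python
import bisect

def reconstruct_table(snapshots, changes, when):
    """Rebuild the routing table as it stood at time `when`.

    snapshots -- list of (timestamp, table_dict), sorted by timestamp
    changes   -- list of (timestamp, prefix, attrs_or_None), sorted by timestamp;
                 attrs of None means the prefix was withdrawn
    """
    snap_times = [ts for ts, _ in snapshots]
    i = bisect.bisect_right(snap_times, when) - 1   # latest snapshot <= when
    if i < 0:
        raise ValueError("no snapshot at or before the requested time")

    snap_ts, snap_table = snapshots[i]
    table = dict(snap_table)                        # do not mutate stored snapshot

    for ts, prefix, attrs in changes:
        if ts <= snap_ts:
            continue                                # assumed already in snapshot
        if ts > when:
            break                                   # past the requested time
        if attrs is None:
            table.pop(prefix, None)
        else:
            table[prefix] = attrs
    return table
```

Starting from the nearest earlier snapshot bounds the number of change records that must be replayed for any one query.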

According to one embodiment of the invention, a database is provided that efficiently
indexes and stores routing information for retrieval. More particularly, a database according to
one embodiment of the invention allows for fast and easy searches and quick and efficient
pruning of old or unwanted data. All database records may, for example, be time-stamped to
within one second granularity, and may be stored in sequential order based on the timestamp.
Although particular database storage methods and formats are described herein, the present
invention is not limited to any particular implementation. The database architectures
described below are one implementation of a routing database according to various
embodiments of the invention.

Change Store Architecture 

The change store may be represented by multiple flat binary files. Each file contains 15
minutes worth of change records, and each 15 minute period is aligned on quarter-hour
boundaries based on the wall-clock time of the system. (For example, one file might contain
data for the time interval 16:00:00 September 12, 2001 through 16:14:59 September
12, 2001. The next file in time order would run from 16:15:00 September 12, 2001 through
16:29:59 September 12, 2001, and so on.)

Each file is named based on the start time of the fifteen minute period it covers. This
implies that no separate database index system is needed in order to find a given record. To 
locate change events that happened at a particular time, the system simply opens the file named 
with the start time of the fifteen minute period of interest. 
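
The naming rule can be sketched as follows. The specification does not give the actual file name format, so the `%Y%m%d-%H%M%S` pattern, the `.chg` extension and the function name `change_file_name` are hypothetical choices for illustration.

```python
from datetime import datetime

def change_file_name(ts: datetime) -> str:
    """Return the (hypothetical) name of the change file covering time `ts`."""
    # Round the minute down to the enclosing quarter-hour boundary.
    start = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
    return start.strftime("%Y%m%d-%H%M%S") + ".chg"
```

Under these assumptions, an event at 16:14:59 on September 12, 2001 maps to the file that starts at 16:00:00, and an event one second later maps to the file that starts at 16:15:00, so no separate index is consulted.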



Snapshot Store Architecture

The snapshot store may be kept similarly to the change store. A complete global 
routing table snapshot may be stored, for example, in a single flat file. One snapshot is taken 
every four hours, on the hour, and the file is named with the time of the snapshot. It is 
therefore possible to establish exactly where in the sequential change store a given snapshot
falls, based on the timestamps of the snapshot and the change records. 

Statistics Store Architecture 

The Statistics Store keeps data slightly differently than the other two stores, because it
stores a large number of samples for a set of interesting statistics categories. As mentioned
above, statistics information may be aggregated (either by summing or averaging, for example)
into a single number representing a five-minute sample for that statistic, which is then stored in
the routing database as follows.

All statistics may be stored, for example, in a set of flat files, each file containing all of
the samples from a given time period. Each file includes a set of records, one for each type of
statistic kept. Each record contains a numerical ID, corresponding to a well-known constant
representing one of the types of statistics, the set of sample values for that file's time period for
that statistic, and for efficiency a separate entry for the maximum observed sample value for
that time period. Because all of the records in a file contain the same number of samples, all
records are the same size. Each file also contains a header, which contains the start and stop
timestamp of the samples contained in the file. Each file is named by the timestamp of the
time period its samples cover.

Because each record in a file is of fixed length, it is possible to deterministically read
the samples for any data type for any time period simply by opening the file covering that time
period and calculating the offset in the file where the record is located. Because each record in
a file is the same size, this calculation can be easily performed by multiplying the numerical id
of the statistic type to be read by the size of each record in the file. This allows a lookup of the
sample for a given time for any type of statistical information stored in the database, as well as
for the maximum value observed over the given time period for each statistic.
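
The offset calculation can be illustrated with Python's `struct` module. The binary layout below is an assumption, since the specification does not define field widths: a header of two unsigned 32-bit timestamps, and records made of a 32-bit ID, a 64-bit maximum value and a fixed number of 64-bit samples. The sketch also adds the header size to the product of ID and record size, on the assumption that the header sits at the front of the file and that statistic IDs start at zero.

```python
import struct

HEADER = struct.Struct("<II")          # assumed: start and stop timestamps

def record_size(samples_per_file: int) -> int:
    """Size in bytes of one record: id, maximum value, then the samples."""
    return struct.calcsize("<Id") + 8 * samples_per_file

def record_offset(stat_id: int, samples_per_file: int) -> int:
    """Byte offset of the record for `stat_id` (ids assumed to start at 0)."""
    return HEADER.size + stat_id * record_size(samples_per_file)

def read_record(buf: bytes, stat_id: int, samples_per_file: int):
    """Decode one record from the file contents without scanning the others."""
    fmt = struct.Struct("<Id%dd" % samples_per_file)
    rid, peak, *samples = fmt.unpack_from(buf, record_offset(stat_id, samples_per_file))
    return rid, peak, samples
```

Because every record has the same length, reading one statistic costs a single seek and a single fixed-size read, whatever the number of statistic types stored in the file.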

Database Pruning 

To reduce the disk storage requirement of the routing database, the routing database 
may be periodically pruned to remove old and unwanted information. Periodically, all data 
older than a configurable time interval (default is six months) may be deleted, for example. 
The system may prune by first selectively removing state table dump files beyond a certain 
timeframe. Because only one complete state table file (or system/peer start) is used for state 
synchronization, pruning according to one embodiment of the invention trades off data query 
speed (which is dominated by state synchronization time) with data storage requirements. 

Because of the efficient database architecture, removing this data is as easy as deleting 
all files with names that correspond to times older than the configured time interval. The 
database is also pruned if it runs out of disk space. In this case, the oldest files are deleted until 
there is sufficient disk space to store new data. 
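
Because files are named by the time period they cover, selecting files to prune reduces to comparing names against a cutoff. The sketch below is an illustration only: the name format `%Y%m%d-%H%M%S` and the function name `files_to_prune` are hypothetical, and the function merely selects candidates rather than deleting anything.

```python
from datetime import datetime, timedelta

NAME_FORMAT = "%Y%m%d-%H%M%S"   # hypothetical timestamp-based file name

def files_to_prune(names, now, max_age):
    """Return, oldest first, the file names whose start time is older than max_age."""
    cutoff = now - max_age
    # With this format, lexical order of the names matches chronological order.
    return sorted(n for n in names if datetime.strptime(n, NAME_FORMAT) < cutoff)
```

The same oldest-first ordering could serve the out-of-disk-space case, where files are removed from the front of the list until enough space is free.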

Statistics Store Data Compression 

Due to the large amount of data that can possibly be generated each day by collector
20a, it may be desired to compress data over time so that a storage device associated with
monitor 20 does not become full. This may be performed, for example, by using
industry-standard round-robin database techniques. Specifically, the system may aggregate older
5-minute samples into samples covering larger time periods. Depending on disk size, the routing
database can keep a known (and configurable) number of days worth of unaggregated 5-minute
samples. In one embodiment of the invention, each day's worth of samples is kept in a
separate file and named with the timestamp of the day the file covers.

Samples older than the number of days to be kept are aggregated together and stored on
a weekly basis. To aggregate, six (6) 5-minute samples may be averaged together to yield one
30-minute week sample. If desired, the ratio of day to week samples can be configured to
trade off disk usage vs. sample granularity. A week's worth of 30-minute samples is then
stored in a single file covering one week, which is again named based on the time period that it
covers. As with days, a configurable number of week time periods can be stored before the
system aggregates the samples into the next time period, which is monthly. The maximum
5-minute value of the days covered by each week file is also stored for each record, allowing the
system to later determine the peak value for each statistic.

The sample aggregation continues in a similar fashion through two more aggregation 
levels. The next aggregation is weekly to monthly. By default, eight (8) 30-minute week 
samples are averaged together to form a 4-hour month sample, and a month's worth of these
samples are stored in a file that is again named by the time period covered, for example. As 
before, the ratio of week to month samples is configurable, the number of months to be stored 
before aggregating further is also configurable, and the maximum of the maximum 5-minute 
samples from the weeks covered is preserved for each record. 

The next and final aggregation level is yearly, in which (by default) 12 month samples
are averaged together to make one 2-day year sample, and a year's worth of samples is stored
for each record in a file named after the time period covered. This aggregation level is
configurable as before, and other information is carried over as with previous aggregation
levels.

It should be appreciated that data may be collected, stored, and aggregated using
different methods, and the invention is not limited to any particular method.
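
One aggregation step can be sketched as follows, as an illustration of the averaging and peak-preservation described in this section. The function name `aggregate` and the representation of each record as a `(samples, peak)` pair are assumptions. With the default ratios given above, the sample width grows from 5 minutes to 30 minutes (ratio 6), then to 4 hours (ratio 8), then to 2 days (ratio 12).

```python
def aggregate(records, ratio):
    """Merge consecutive records for one statistic into a coarser record.

    records -- list of (samples, peak) pairs for consecutive time periods,
               where peak is the maximum 5-minute value seen in that period
    ratio   -- number of fine samples averaged into one coarse sample
    """
    merged = [s for samples, _ in records for s in samples]
    usable = len(merged) - len(merged) % ratio      # ignore a trailing partial group
    averaged = [sum(merged[i:i + ratio]) / ratio for i in range(0, usable, ratio)]
    peak = max(p for _, p in records)               # maximum of the maxima
    return averaged, peak
```

Carrying the peak separately matters because averaging smooths spikes away: a period whose 30-minute averages never exceed 3.5 may still have contained a 5-minute sample of 6.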

Detector 

Detector 20c is adapted to detect the control signal anomalies by comparing the records
to an anomaly pattern, predetermined thresholds and/or statistical models of prior control
traffic and signaled topology. If components of the records and related control signal
information exceed the predetermined threshold or statistical models, a control signal anomaly
is detected. Thereafter, the detected control signal anomaly and related control signal
information can be stored in the database 20g. An example of the control signal anomaly and
related control signal information, which is stored in the database 20g, can include a
description of the data path through various ISPs 14a, 14b and/or 14c of the network computer
system 10 for which a hosting computer system (not shown) provides Internet web pages or
other information to a requesting computer system (not shown).
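
A simple form of the threshold and statistical comparison might look like the sketch below. The specification does not say which statistical model the detector uses; the "mean plus k standard deviations" rule, the function name `is_anomalous` and its parameters are stand-ins chosen for illustration.

```python
from statistics import mean, stdev

def is_anomalous(history, value, fixed_threshold=None, k=3.0):
    """Flag `value` if it exceeds a fixed threshold or a simple statistical bound.

    history         -- earlier samples of the same statistic (e.g. updates per period)
    fixed_threshold -- optional predetermined limit
    k               -- number of standard deviations above the mean tolerated
    """
    if fixed_threshold is not None and value > fixed_threshold:
        return True
    if len(history) >= 2:                      # stdev needs at least two samples
        return value > mean(history) + k * stdev(history)
    return False
```

For example, with a history of roughly 100 BGP updates per period, a period containing 120 updates would be flagged, while one containing 103 would not.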

Profiler 

The profiler 20d is also adapted to receive the records and related control signal
information from the collector 20a and to process the records and related control signal
information to generate statistical models of signaled topology and the signal traffic thresholds,
which are concomitantly communicated to the detector module 20c. In this configuration, the
thresholds and statistical models calculated in the detector 20c are adaptively adjusted based on
changing trends or profiles of the records and related control signal information received by the
profiler 20d. The changing trends or profiles of the records and related control signal
information, for example, can include changes in connection configurations of the various ISPs
14a, 14b and/or 14c; changes in the computer systems 16, which subscribe to the various ISPs
14a, 14b and/or 14c; or changes in the data paths previously employed for communicating
information between computer systems 16 over ISPs 14a, 14b and/or 14c of the network
computer system 10.

Local Controller 

Local controller 20f, which is coupled to both the detector 20c and to the profiler 20d, is
adapted to receive the control signal anomalies from the detector 20c, as well as the related
control signal information, as previously described. After receiving the control signal
anomalies and the related control signal information, the local controller 20f generates a signal
or an alert message. The alert message can include pertinent information related to the control
signal anomaly. The pertinent information related to the anomaly can include the
characteristics of the anomaly, path attributes associated with the control signals, the source
and destination of the anomaly, the detection mechanism used to identify the anomaly, the
predetermined threshold, routing systems in the path of the anomaly, as well as the magnitude
or severity of the anomaly. The alert message is communicated to zone controller 24 to enable
the zone controller 24 to further process the alert message and to enable the zone controller 24
to communicate the alert message to other Zones U, V, W, X, Y and Z and/or ISPs 14b and 14c.



Zone Controller 

Referring further to Figure 7, in one embodiment the zone controller 24 includes a 
correlator 24a coupled to the router monitor 20. The correlator 24a includes a communication 
interface adapter 24e. The zone controller 24 further includes an alert message database 24b, 
which is coupled to the correlator module 24a. A web server 24c and access scripts software 
24d are also defined on the controller 24. 

The zone controller 24 is adapted to receive a plurality of alert messages from the router 
monitor 20, and to process the alert messages by correlating the alert messages based on the 
pertinent information related to the control signal anomaly, as described above. The zone 
controller 24 of Zone X, as well as other various controllers (not shown), which are included in
the other various Zones U, V, W, Y and Z, are similarly constructed and arranged as the
controller 24 of Zone X.

More precisely, the correlator 24a is adapted to receive and categorize the alert messages 
and to generate a number of tables including the categorized alert messages. The tables 
including the categorized alert messages are stored in the alert message database 24b, which is 
coupled to the correlator module 24a. The correlator module 24a is further adapted to compare 
the alert messages to determine if trends exist. One example of a trend can be a plurality of
alert messages that are traceable through the network computer system 10 to a particular
computer system 16. Another example of a trend can be a plurality of alert messages that
include similar characteristics. 
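
The trend check can be illustrated by grouping alert messages on a shared attribute. This is a sketch under stated assumptions: alerts are represented as dictionaries, and the field names `source` and `kind`, the function name `find_trends` and the `min_count` cutoff are hypothetical and do not appear in the specification.

```python
from collections import defaultdict

def find_trends(alerts, key, min_count=3):
    """Group alerts by `key` and keep the groups large enough to count as a trend."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert[key]].append(alert)
    return {value: group for value, group in groups.items() if len(group) >= min_count}
```

Grouping on `source` corresponds to the first example of a trend (alerts traceable to one computer system), and grouping on a characteristic such as `kind` corresponds to the second.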

The communication interface adapter 24e operates to provide a communication interface
to an external computer device 30, such as a notebook computer, desktop computer, server or
personal digital assistant ("PDA") or other type of system. The personal computing device 30
can be adapted to run network management interface software 30a, such as HP OpenView,
available from Hewlett-Packard Company of Palo Alto, California. The network management
interface software 30a is adapted to interface with the alert message database 24b and to
provide a graphical user interface ("GUI") on the display 30b of the computing device 30.
Thereafter, a network administrator can view and respond to the alert messages.

Alternatively, the personal computing device 30 can include a conventional web
browser 30c, which is similarly adapted to interface with the alert message database 24b, via a
web server 24c and access scripts module 24d, to provide a graphical user interface ("GUI") on
the display 30b of the computing device 30. Similar to that described above, the network
administrator can view and respond to the alert messages.

After detection of the anomalous control signal, the controller 24 blocks anomalous 
control signals as close to their source as possible. By taking a global view of the ISP 
computer networks 14a, 14b and 14c, the controller 24 is able to coordinate the configuration of 
the routing systems 22, 22b and/or 22c to filter certain types of traffic by employing either 
custom filtering hardware (not shown) or filtering mechanisms included in the routing systems.

Referring again to Figure 5, in one specific example, a malicious control signal attack 
from a computer system 17 located in Zone U of ISP computer network 14b to one specific 
computer system 16a of Zone X can be detected, tracked and blocked by the system 5 
according to one embodiment of the present invention. 

In this example, the malicious control signal attack executed by the computer system 17
includes a BGP message, which is an Internet Protocol ("IP") packet containing various data
indicating that computer system 17 can provide computer system 16a with the most efficient
route to a particular destination computer (not shown) for enabling computer system 16a to
obtain information (e.g., a requested web page).

Referring further to Figure 8, the specific trajectory of the malicious control signal
attack from the computer system 17 of Zone U located on ISP 14b to computer system 16a of
Zone X located on ISP 14a is illustrated by the control signal attack path 200. The control
signal attack path 200 commences at the attacking computer system 17 and extends through the
routing system 22d, through the router monitor 20c, through the controller 24b, through the
computer network 18, through the controller 24, through the router monitor 20, through the
routing system 22 and to the targeted computer system 16a.

After IP packets associated with the malicious control signal attack flow through the
routing system 22, the routing system 22 generates a control signal and related information
packet, which is exported to the router monitor 20. The control signal and related information
packet describes the data path or traffic flow characteristics between computer system 17
(control signal attacker) and the computer system 16a (target of control signal attack). The
control signal attack can be represented as the computer system 16a receiving an unusual data
path instruction from the computer system 17. This anomalous control signal traffic can be
detected at the router monitor 20 and an alert message is communicated to the controller 24.

Referring again to Figure 6, during the above described example of a control signal
attack, the collector 20a located on the router monitor 20 receives and processes or normalizes
the control signals and related control signal information to generate records. The collector 20a
stores a copy of the records in the persistent archive 20b. The collector 20a also provides a
copy of the records to the detector 20c and to the profiler 20d. The detector 20c analyzes the
records and detects anomalous traffic. In this example, the detector 20c detects the pattern of
records as a control signal attack, because attributes associated with the records exceed a
predetermined threshold and statistical models defined on the detector 20c.

After the detector generates an alert, the controller may send a signal to the refiner to 
gather specific details about the control signal anomaly. Examples of this anomaly detail may 
include all control messages describing a path to a specific web destination, or all signaled 
topology changes received by network device 20 during a specified time period. The detailed
information is forwarded to zone controller 24.

Refiner 

Figure 9 shows a more detailed operation of refiner 20h according to one embodiment of
the invention. According to one embodiment of the invention, an instance (e.g., a software
process executing in a memory of a computer system) is created that responds to queries
directed to one or more databases (e.g., database 20b, 20g, etc.). For example, there may be a
process, referred to as BGPRefiner, that performs queries regarding BGP routing data. In
response to an automated local controller or management station query for detailed information
on the signal anomaly, the local controller invokes a new instance of BGPRefiner. Within the
BGPRefiner process, the query engine consults a local index of data available in the persistent
archive to determine what information is needed to respond to the query. The query engine then
instructs the loader to retrieve the needed data from the persistent archive. The loader
reassembles the retrieved data in the memory image to recreate specific elements of the
signaled anomaly, or recreate the network topology at the time of the signal anomaly.

In the control signal attack example described earlier, the correlator 24a located on the
controller 24 sends a simple network management protocol ("SNMP") alert message (e.g., an
SNMP trap message) to the network management interface 30a located on the personal
computing device 30. This alert message notifies the network administrator and/or security
operators as to the presence of the control signal attack. Included in this alert message is the
network address, such as the universal resource locator ("URL") that describes the anomaly's
location in the database 24b of the controller 24. The network management interface 30a can
share the URL associated with the control signal attack with the web browser 30c also located
on the personal computing device 30. The browser 30c can use a hyper text transfer protocol
("HTTP") type transfer using the URL to visualize the statistics related to the control signal
attack, and to generate filter entries (e.g., by implementing one or more ACLs)
and/or rate limiting parameters (e.g., committed access rate (CAR) parameters) for remediation
of the control signal attack. When the web server 24c receives the URL from the browser 30c,
the web server 24c invokes server-side access scripts 24d, which generate queries to the
database 24b for generating a dynamic HTML web page. The network administrator and/or
security operators can view the control signal anomalies on the web page, which is displayed
on the display 30b of the computing device 30.

Although not shown, in an embodiment, the system 5 for managing network topology
signal information and monitoring, detecting, tracking and blocking control signal anomalies
communicated between routers located in various ISPs of a wide area network can be located
on a computer-readable medium (e.g., a storage medium, such as an optical or magnetic disk,
in a memory of a computer system or controller, or any other type of medium). The storage
medium can be transported and selectively loaded onto the routing systems 22, 22b and/or 22c.
Alternatively, the system 5 for monitoring, detecting, tracking and/or blocking control signal
anomalies communicated between routers located in various ISPs of a wide area network can
be partially located on the routing systems 22, 22b and/or 22c and partially located on other
servers (not shown), or may be located on one or more systems separate from routing systems
22, 22b, and 22c. For example, the router monitor 20 can be located on routing system 22 and
the router monitor 20b can be located on routing system 22c. Further, zone controller 24 can
be co-located with either the router monitor 20 or the router monitor 20b, or zone controller 24
can be located on another server (not shown). It should be appreciated that various aspects of
the invention may be implemented in any location within the computer network, and the
invention is not limited to any particular location.
Query Facility

As discussed above, a query facility may be provided to allow users to access data in
the routing database. This query facility may be provided, for example, by refiner 20h. More
particularly, the query facility may process the data stored in the routing database to perform
useful queries, allowing users to analyze current and past network routing behavior in ways
which were not possible with previous techniques. Using the query facility, users can perform
data mining of the historical routing data, analyze how routing topology and state evolve over
time, and view the routing topology and state for any arbitrary moment in time, including the
current state. Users can also track network instability and interesting or anomalous routing
events (e.g., those occurring in a network routing protocol such as BGP). The
ability to perform queries greatly simplifies the identification, tracing, and remediation of
network problems both inside the monitored network as well as in other parts of the Internet.
Such an ability also helps with planning for future network growth, as queries of the routing
database can be used to identify stable and high-quality networks which would make good
candidates for peering or from which to purchase transit or upstream service.

For queries, a start and stop time can be specified. In this case, only results for the 
given time period are returned. These times may be specified, for example, using a natural 
language interface that can be as formal as a standard timestamp string ("12:05:03 March 5, 
2002"), or as informal as "two days ago". If a start time is not specified, then the system may 
assume a start time of the first data it collected. If no stop time is specified, the system may
assume the current time. Other time specification options may be available for certain query 
types, described below. Queries, according to one embodiment of the invention, can be 
performed by either a user, or by another component of the system (such as a statistical 
modeling engine described in more detail below) that uses the data to provide other services. 
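
By way of illustration only, the defaulting of the start and stop times described above might be sketched as follows. The function name `resolve_query_window` and its parameters are hypothetical and do not appear in the specification; parsing of natural language time strings is assumed to have been done already and is outside the scope of this sketch.

```python
from datetime import datetime
from typing import Optional, Tuple


def resolve_query_window(
    start: Optional[datetime],
    stop: Optional[datetime],
    first_data_time: datetime,
    now: datetime,
) -> Tuple[datetime, datetime]:
    """Fill in missing query bounds (hypothetical helper).

    A missing start defaults to the time of the first collected data;
    a missing stop defaults to the current time.
    """
    resolved_start = first_data_time if start is None else start
    resolved_stop = now if stop is None else stop
    if resolved_start > resolved_stop:
        raise ValueError("start time is after stop time")
    return resolved_start, resolved_stop
```

A caller would pass `None` for whichever bound the user left unspecified.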


Types of Queries 
Routing Topology Queries 

The query facility supports several types of routing topology queries. According to one 
embodiment, the query facility can recreate the entire routing table for any point in history up 
to and including the current time. According to another embodiment, the query facility can
provide a list of all route and topology changes between two arbitrary points in time. Also,
according to yet another embodiment, the query facility can provide an aggregated change
summary, that is, the net change in routing topology and state between any two points in time.

In addition to the time specification options described above, routing topology queries 
can also specify a request for the current state of the routing topology. In this case, all contents
of the current routing table that match any other supplied parameters, described below, are
returned. If the query is for a routing table dump for a given point in time, then the start and stop
times may both be set to the requested time for the routing table dump.

Queries can be specified to return all results, or only results matching a given set of 
parameters. Parameters that may be specified in a query include, for example: 
router
inferred peer (described below)
exact default-free prefix
routes for prefixes that include a given prefix
routes for prefixes that are included by a given prefix
routes matching an AS regular expression

Any combination of the above parameters can be specified for each query. In addition, 
the user can specify exactly which attributes (e.g., BGP attributes) are to be returned for each route
or route change. Any combination of attributes may be selected for any query. According to
one embodiment, BGP attributes included in the BGP specification are supported.
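
As a non-limiting sketch, several of the query parameters listed above could be combined into a single route filter as shown below. The `Route` record, the `matches` function, and the convention of rendering an AS path as a space-separated string for regular expression matching are assumptions made for illustration; the inferred peer parameter is omitted, and all prefixes are assumed to be of the same address family.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    router: str          # identifier of the reporting router
    prefix: str          # e.g. "10.1.0.0/16"
    as_path: List[int]   # e.g. [64500, 64501]


def matches(
    route: Route,
    router: Optional[str] = None,
    exact: Optional[str] = None,       # exact prefix
    covers: Optional[str] = None,      # route prefix must include this prefix
    covered_by: Optional[str] = None,  # route prefix must be included by this prefix
    as_regex: Optional[str] = None,    # regular expression over the AS path text
) -> bool:
    """Return True if the route satisfies every supplied parameter."""
    net = ipaddress.ip_network(route.prefix)
    if router is not None and route.router != router:
        return False
    if exact is not None and net != ipaddress.ip_network(exact):
        return False
    if covers is not None and not ipaddress.ip_network(covers).subnet_of(net):
        return False
    if covered_by is not None and not net.subnet_of(ipaddress.ip_network(covered_by)):
        return False
    if as_regex is not None:
        # Assumed convention: AS path rendered as "64500 64501 64502".
        path_text = " ".join(str(asn) for asn in route.as_path)
        if re.search(as_regex, path_text) is None:
            return False
    return True
```

Omitted parameters are treated as wildcards, so any combination of parameters can be supplied for a query.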

As discussed above, one embodiment of the system identifies inferred peers. An 
inferred peer is a network-level peer (not a BGP peer) that is identified by the system. A 
network-level peer is a network that directly exchanges traffic and routes with the monitored 
network. Network-level peers are inferred by examining the routing tables from all of the
monitored routers. If a router for a given AS, X, reports routes that include an AS path in
which AS Y is the first AS (for iBGP (Interior BGP) peering sessions) or in which AS X is the
first AS and AS Y is the second AS (for eBGP (Exterior BGP) peering sessions), then AS Y is
inferred to be a peer of AS X. 
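
The inference rule above can be illustrated with the following minimal sketch. The function name `infer_peers`, the `"ibgp"`/`"ebgp"` session labels, and the representation of a route as a (session type, AS path) pair are hypothetical. The checks that skip paths whose relevant entry equals the local AS (for example, because of AS path prepending) are an added assumption not stated in the text.

```python
from typing import Iterable, List, Set, Tuple


def infer_peers(local_as: int, routes: Iterable[Tuple[str, List[int]]]) -> Set[int]:
    """Infer network-level peers of local_as (AS X) from reported AS paths."""
    peers: Set[int] = set()
    for session_type, as_path in routes:
        if session_type == "ibgp":
            # iBGP session: the first AS in the path is the peer (AS Y).
            if as_path and as_path[0] != local_as:
                peers.add(as_path[0])
        elif session_type == "ebgp":
            # eBGP session: AS X is first and the peer (AS Y) is second.
            if len(as_path) >= 2 and as_path[0] == local_as and as_path[1] != local_as:
                peers.add(as_path[1])
    return peers
```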



Statistic Queries

Statistics kept by the statistic store can also be queried. As with routing topology 
queries, statistic queries can be specified for any type of statistical information that is kept, for 
any time period for which there is data, and can be performed for all monitored data, a specific
router, or a specific inferred peer. Because the results of statistic queries are generally numeric, the
results may be returned to the user both in a tabular format and as a line graph of the data over
the time period queried.


Example Routing Topology Query Procedure 

According to one embodiment of the invention, a query facility performs routing 
topology queries using the following algorithm. The query is given a start and stop time, as 
described above. The query facility (implemented by refiner 20h, for example) then performs 
the following steps to find the results of the query:

1. Establish routing state for start of query.

2. Set search start time to the latest of: the given query start time, the first system start
time, or, if the query is for a single router, the time when peering was established.

3. Find the most recent routing snapshot prior to the search start time, and load it as the initial
routing table. If no dump exists prior to the search start time, set the initial routing table to be
empty.

4. Starting at the time of the snapshot, or at the search start time if the initial state is
empty, apply all changes from the change store in order to the initial routing table, starting with
the first change after the snapshot was taken, up to the search start time.

5. The resulting routing table at the end of step 4 above is the routing table for the start of
the query.

6. Return all entries from the initial routing table that match the specified query
parameters.

7. If the start time is "now", or the current time, then stop.

8. If the query is for a list of route changes, then starting with the search start time, output
all changes from the change store in order that match the query parameters, ending with the
supplied end time, or the most recent change received, whichever is earlier.

9. If the query is for an aggregated change summary, save the initial routing table. Then apply
all changes from the change store to the initial routing table, up to the specified end time or the
latest change, whichever is earlier. Return the difference between the initial routing table and
the final routing table.
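
The steps above can be sketched, purely by way of example, as a snapshot-plus-replay computation. The data layout is hypothetical: a routing table is simplified to a mapping from prefix to a route string, a change is a (time, prefix, route) tuple in which a route of `None` denotes a withdrawal, and times are plain integers. The sketch also assumes that a snapshot already reflects any change carrying the same timestamp, and it omits filtering by query parameters.

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[str, str]                   # prefix -> route attributes (simplified)
Change = Tuple[int, str, Optional[str]]  # (time, prefix, new route or None)


def apply_change(table: Table, change: Change) -> None:
    """Apply one announcement or withdrawal to the table in place."""
    _, prefix, route = change
    if route is None:
        table.pop(prefix, None)
    else:
        table[prefix] = route


def table_at(snapshots: List[Tuple[int, Table]], changes: List[Change], when: int) -> Table:
    """Rebuild the routing table as of time `when` (inputs sorted by time)."""
    snap_time: Optional[int] = None
    table: Table = {}
    for t, snap in snapshots:
        if t > when:
            break
        snap_time, table = t, dict(snap)  # most recent snapshot not after `when`
    for change in changes:
        t = change[0]
        if snap_time is not None and t <= snap_time:
            continue  # assumed to be already reflected in the snapshot
        if t > when:
            break
        apply_change(table, change)
    return table


def changes_between(changes: List[Change], start: int, stop: int) -> List[Change]:
    """List the changes after `start`, up to and including `stop`."""
    return [c for c in changes if start < c[0] <= stop]


def net_change(snapshots, changes, start: int, stop: int):
    """Aggregated change summary: prefix -> (route at start, route at stop)."""
    before = table_at(snapshots, changes, start)
    after = dict(before)
    for change in changes_between(changes, start, stop):
        apply_change(after, change)
    return {
        p: (before.get(p), after.get(p))
        for p in before.keys() | after.keys()
        if before.get(p) != after.get(p)
    }
```

Keeping periodic snapshots bounds the number of changes that must be replayed to reach any requested point in time.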

Statistical Modeling Engine

In addition to the raw data on number and type of changes that occur, the system may 
build a statistical model of the observed network behavior, which it uses to help identify 
routing and topology problems. These statistics are calculated based on the observed routing 
topology and the change history. These statistics may be calculated, for example, by a
statistical modeling engine of profiler 20d of monitor 20. These statistics may be provided to
detector 20c to assist in detecting anomaly conditions. There are several classes of statistics
that may be observed, and which serve as the basis for the anomaly detection carried out by
detector 20c, described above. Detector 20c may include an anomaly detection engine (not
shown) that performs anomaly detection functions using several types of statistics described
below.

BGP Update Statistics 

The statistical modeling engine may model, for example, BGP update statistics
as discussed above. As with collector 20a described above, statistical models may be built for
each of these types of routing changes on a network-wide, per-router, and per-inferred-peer
basis.

The statistical model for each type of event may include a set of average/standard
deviation pairs, one for each of several timeframes. For example, one average and standard
deviation is calculated and updated for every 5-minute sample that is taken. Another may be
calculated and continually updated separately for each 5-minute time period in a single day
(e.g., all 5-minute samples from 8:10 am every day may be averaged together). Finally, one is
calculated and continually updated for each 5-minute sample period over the course of an entire
week. These values are used by the anomaly detection engine to detect certain types of
anomalous routing instability, as described below.
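
One possible way to maintain the three average/standard deviation pairs is sketched below. The class names `RunningStat` and `UpdateModel` are hypothetical, and the use of Welford's incremental update and of the population standard deviation are choices made for this illustration rather than details taken from the specification.

```python
import math
from collections import defaultdict
from datetime import datetime


class RunningStat:
    """Incrementally updated mean and (population) standard deviation."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


class UpdateModel:
    """Per-event-type model over three timeframes of 5-minute samples."""

    def __init__(self) -> None:
        self.overall = RunningStat()                     # every sample
        self.by_time_of_day = defaultdict(RunningStat)   # 288 slots per day
        self.by_time_of_week = defaultdict(RunningStat)  # 2016 slots per week

    @staticmethod
    def day_slot(ts: datetime) -> int:
        return (ts.hour * 60 + ts.minute) // 5

    @classmethod
    def week_slot(cls, ts: datetime) -> int:
        return ts.weekday() * 288 + cls.day_slot(ts)

    def add_sample(self, ts: datetime, count: float) -> None:
        """Record one 5-minute sample (e.g. a count of BGP updates)."""
        self.overall.add(count)
        self.by_time_of_day[self.day_slot(ts)].add(count)
        self.by_time_of_week[self.week_slot(ts)].add(count)
```

One such model could be kept for each event type on a network-wide, per-router, and per-inferred-peer basis.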

Route Distribution Statistics 

According to one embodiment of the invention, the system may model Internet 
topology and path characteristic probabilities from the local BGP domain's perspective. For
example, the system (e.g., system 5) calculates probabilities based on long-term historical
behavior inferred from the monitored BGP information. This information includes the tens of
thousands of BGP update messages that BGP backbone routers typically receive in the Internet
each day.

As discussed above, the system may calculate probabilities of individual ASPath
adjacencies, path origination from a given AS, and path selection over moving window
timeframes. For example, the system may observe that the address block for a
particular website (for example, www.website.com) originates from BGP routers in both the
autonomous system of Website Company, Inc. and that of Website Company Subsidiary, Inc. In this
example, the website is an Internet site critical to a business's workflow.

Over the course of several months, the path remains stable from both of these
companies. For this given prefix (and similarly for the hundreds of thousands of other Internet
address blocks), the system calculates an expected range of path characteristics (probabilities).
For any significant change in path characteristics, the system will automatically generate
anomalies per user-configured alerting behaviors (see next section). So, if the Internet path to
www.website.com suddenly changes to originate from a previously unknown AS, an alert may
be generated.
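
The origin AS example above might be modeled, as a simplified sketch only, by tracking the relative frequency of each origin AS per prefix. The class `OriginModel`, the fixed-size window counted in observations rather than in elapsed time, and the `min_probability` threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict, deque


class OriginModel:
    """Tracks how often each origin AS is observed for each prefix."""

    def __init__(self, window: int = 1000) -> None:
        # Moving window approximated by the last `window` observations.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, prefix: str, origin_as: int) -> None:
        self.history[prefix].append(origin_as)

    def probability(self, prefix: str, origin_as: int) -> float:
        """Observed fraction of routes for `prefix` originated by `origin_as`."""
        seen = self.history.get(prefix)
        if not seen:
            return 0.0
        return Counter(seen)[origin_as] / len(seen)

    def is_anomalous(self, prefix: str, origin_as: int,
                     min_probability: float = 0.01) -> bool:
        """Flag an origin whose observed probability is below the threshold."""
        return self.probability(prefix, origin_as) < min_probability
```

Under this sketch, a prefix that has long been originated by two known ASes would yield an anomaly the first time a previously unseen AS originates it.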

Anomaly Detection Engine 

The anomaly detection engine uses the statistical models developed by the statistical 
modeling engine to detect anomalous network behavior and to take appropriate action based on
the severity of the anomaly. Anomaly detection based on configurable static thresholds may also be
performed. As with the statistical models described above, there are several types of anomalies
that can be detected.






Instability Anomalies 

Instability anomalies may be detected, for example, based on the BGP update statistical 
models calculated by the statistical modeling engine. They may be referred to as instability 
anomalies because an anomalous level of BGP changes (e.g., frequent routing changes) is a 
sign of network and routing instability. Detecting instability anomalies gives a network 
operator a head start on detecting and solving network problems, rather than waiting until 
network services are compromised. One advantage of detecting anomalies according to
aspects of the invention is that not only do the detections indicate to the network
operator that a problem is occurring, they also give a precise description of where the problem is
occurring and what exactly is happening.

Instability Anomaly Detection 
Instability anomalies may be detected based, for example, on a comparison of the most
recent measurement for a given type of data and a weighted combination of the averages and
standard deviations that have been calculated for the given data.
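
The specification does not give the weights or the form of the comparison. The sketch below assumes one plausible form: the averages and the standard deviations from the several timeframes are each combined with the same normalized weights, and the most recent measurement is flagged when it exceeds the combined average by more than `threshold` combined standard deviations. The function names and the default threshold of 3.0 are hypothetical.

```python
from typing import Sequence, Tuple


def instability_score(
    latest: float,
    models: Sequence[Tuple[float, float]],  # (average, standard deviation) per timeframe
    weights: Sequence[float],
) -> float:
    """Deviation of `latest` from the weighted average, in weighted std units."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per (average, stddev) pair")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(w * m for w, (m, _) in zip(weights, models)) / total
    std = sum(w * s for w, (_, s) in zip(weights, models)) / total
    if std == 0:
        return 0.0 if latest == mean else float("inf")
    return (latest - mean) / std


def is_instability_anomaly(latest, models, weights, threshold: float = 3.0) -> bool:
    """Assumed rule: anomalous when the score exceeds the threshold."""
    return instability_score(latest, models, weights) > threshold
```

For example, with pairs (100, 10), (120, 20), and (110, 30) and weights 1, 2, and 1, the combined average is 112.5 and the combined standard deviation is 20, so a measurement of 200 scores 4.375 and would be flagged.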

Event Notification Service 

The system may also provide an event notification service that allows users to specify
network events or routing and topology state changes that they wish to be notified of when they
occur. This event notification service may be provided, for example, by controller 24. Events
can be specified using the same parameters used when specifying a query. However, rather
than returning all state matching the query parameters, the system may instead send a
notification whenever the associated state changes. Notifications can be sent, for example,
by email, by remote syslog, or by sending an SNMP trap to a network address.

An event notification service is very useful for detecting changes to correct routes, and 
for being alerted to changes in network connectivity. This service, according to one 
embodiment of the invention, may provide a robust early warning system for network problems.

The event notification service is also used, for example, by the anomaly detection
engine to notify users about detected anomalies.

Remote Monitoring Service 

An important part of network routing topology analysis is determining what view other
networks have of the routing topology. When a network announces its own routes to other
networks (e.g., to the rest of the Internet), those routes can be manipulated or corrupted by other
networks that propagate them. This can cause other networks to have an incorrect view of how
to reach that network. Even worse, other networks can either intentionally or accidentally
hijack or alter those routes, causing some or all of the network (e.g., the Internet) to be unable
to reach the original network, and perhaps even causing traffic for that network to be diverted
to another location entirely. The original network does not perceive these incorrect route
announcements directly, because they occur in other parts of the Internet and are not sent to it.

To help address this problem, a system according to one embodiment of the invention 
may be capable of importing information from remote routers that are not being directly
monitored. This capability may be provided, for example, by remote monitoring. This remote
monitoring may include placing one or more monitors 20 at diverse points within the Internet
topology, and monitoring one or more routers at each point. According to one aspect of the
invention, the system may monitor as many different networks (ASes) as possible, to achieve
the widest possible view of the variation in Internet routing topology at different points in the
Internet. This set of monitors forms the infrastructure for the remote monitoring service. They
may be, for example, administered by a third party, and made publicly available to users of the
system for the purpose of querying against remote views of the network topology.



Remote Queries 

To use the remote monitoring service, a user simply accesses the user interface on his or her
local monitor 20, just as he or she would do in normal use (for example, to execute a query).
The user can then specify a query as described in the section on the query facility. In addition, the user
can specify the source of information for that query. This can be any set of remote monitors 20
that are available. Each remote monitor 20 provides the network-wide view of the network it is
monitoring, and all remote queries are made against only that view, as opposed to the per-router and
per-inferred-peer data that can be accessed on the local monitor 20. The per-router and
per-inferred-peer views may be withheld from remote users to protect the security and
confidentiality of the remote network, not for technical reasons. The system may be capable of
querying and displaying that information in a similar manner as discussed above.

One useful application of this remote query facility is to determine how other networks view
the user's network. To determine this, two queries are made. In the first query, all remote 
collectors are queried for all routes containing the local network's AS number, using an AS 
regular expression query. In the second query, all remote collectors are queried for routes 
matching the local network's assigned IP address space. Together, these two queries provide a 
complete picture of how all remotely-monitored networks view the local network, and can be 
used to identify problems such as deaggregated or hijacked routes. 

Locating Remote Monitors

To perform remote queries, monitors 20 may be located for the purpose of executing the queries.
This can be performed by one of several methods. For example, a list can be maintained by the
third party that maintains the monitors, and downloaded and updated periodically.
Alternatively, a multicast group can be formed, which monitors 20 would join and periodically
send a message to a well-known multicast address to announce their presence.

User Interface 

For user-visible features, such as the query facility, and the configuration and viewing
of anomalies and traps, several different types of interfaces may be provided. For example, a
command-line interface (CLI) may be provided that accepts simple text-based commands for
accessing features. A graphical user interface (GUI) may, according to one embodiment of the
invention, allow direct graphing and viewing of system data, as well as more elaborate output
formats than are supported by the CLI. The CLI may be, for example, a standard VT100
compatible terminal interface accessible through standard telnet or ssh programs.
The GUI may also use a standard web interface, with HTTPS for security, for accessing the
system. Any standard web browser can be used to access the GUI with full security and all
capabilities. It should be appreciated that any user interface may be used to view anomaly and
other data.

Having thus described at least one illustrative embodiment of the invention, various
alterations, modifications and improvements will readily occur to those skilled in the art. Such
alterations, modifications and improvements are intended to be within the scope and spirit of 
the invention. Accordingly, the foregoing description is by way of example only and is not 
intended as limiting. 

What is claimed is:

1. A method for monitoring control signal traffic over a computer network comprising a
plurality of network communication systems by a computer system, the method comprising
acts of:

receiving, from at least one of the plurality of network communication systems, at least
one control signal communicated to one or more other network communication systems; and
storing the at least one control signal in a database of the computer system.

2. The method according to claim 1, wherein the at least one control signal controls
forwarding of data in the computer network.

3. The method according to claim 1, wherein the at least one control signal is a route entry
stored in a memory of the at least one of the plurality of network communication systems.

4. The method according to claim 1, wherein the at least one control signal is a route
update transmitted by the at least one of the plurality of network communication systems.

5. The method according to claim 1, further comprising an act of determining, based on
the at least one control signal, an anomaly in the computer network.

6. The method according to claim 5, further comprising an act of generating an alert signal
based on the determined anomaly.

7. The method according to claim 5, wherein the act of storing further comprises storing a
plurality of control signals over time.

8. The method according to claim 5, further comprising an act of performing, in response
to the act of determining the anomaly, an administrative act in the computer network.

9. The method according to claim 8, wherein the anomaly includes one or more attributes
and the method further comprises an act of tracking the one or more attributes of the anomaly
to at least one source.

10. The method according to claim 9, further comprising an act of filtering a control signal
produced by the at least one source that relates to the anomaly.

11. The method according to claim 9, wherein the act of filtering is performed in one of the
plurality of network communication systems.

12. The method according to claim 11, wherein the one of the plurality of network
communication systems is a router.

13. The method according to claim 12, further comprising an act of creating a filter in the
router to filter control data transmitted by the at least one source.

14. An apparatus for monitoring control signal traffic over a computer network comprising
a plurality of network communication systems, the apparatus comprising:

a monitor that receives, from at least one of the plurality of communications systems, at
least one control signal communicated to one or more other network communications systems
and which stores the at least one control signal in a database.

15. The apparatus according to claim 14, further comprising a controller that receives, from
the monitor, the at least one control signal and stores the at least one control signal in the
database.

16. The apparatus according to claim 14, wherein the monitor stores the at least one control
signal in a persistent archive.

17. The apparatus according to claim 14, further comprising a detector that detects an
anomaly based on the at least one control signal.

18. The apparatus according to claim 14, further comprising a profiler that generates a
profile of at least one of network communication trends in the computer network and topology
of the computer network.

19. The apparatus according to claim 17, further comprising a controller that is adapted to
receive the detected anomaly from the detector, and is adapted to communicate the anomaly in
an alert message.

20. A computer-readable medium encoded with instructions for execution on a computer
system, the instructions when executed, perform a method comprising acts of:

receiving, from at least one of the plurality of network communication systems, at least
one control signal communicated to one or more other network communication systems; and
storing the at least one control signal in a database of the computer system.

21. The computer-readable medium according to claim 20, wherein the at least one control
signal controls forwarding of data in the computer network.

22. The computer-readable medium according to claim 20, wherein the at least one control
signal is a route entry stored in a memory of the at least one of the plurality of network
communication systems.

23. The computer-readable medium according to claim 20, wherein the at least one control
signal is a route update transmitted by the at least one of the plurality of network
communication systems.

24. The computer-readable medium according to claim 20, the method further comprising
an act of determining, based on the at least one control signal, an anomaly in the computer
network.

25. The computer-readable medium according to claim 24, the method further comprising
an act of generating an alert signal based on the determined anomaly.

26. The computer-readable medium according to claim 24, wherein the act of storing
further comprises storing a plurality of control signals over time.

27. The computer-readable medium according to claim 24, the method further comprising
an act of performing, in response to the act of determining the anomaly, an administrative act in
the computer network.

28. The computer-readable medium according to claim 27, wherein the anomaly includes
one or more attributes and the method further comprises an act of tracking the one or more
attributes of the anomaly to at least one source.

29. The computer-readable medium according to claim 28, the method further comprising
an act of filtering a control signal produced by the at least one source that relates to the
anomaly.

30. The computer-readable medium according to claim 28, wherein the act of filtering is
performed in one of the plurality of network communication systems.

31. The computer-readable medium according to claim 30, wherein the one of the plurality
of network communication systems is a router.

32. The computer-readable medium according to claim 31, the method further comprising
an act of creating a filter in the router to filter control data transmitted by the at least one
source.